# Image-Text Fusion
Meta Llama Llama 4 Maverick 17B 128E Instruct
Other
Llama 4 Maverick is a multimodal AI model released by Meta that accepts text and image input. It uses a Mixture of Experts (MoE) architecture with 17B active parameters and 128 experts, and targets multilingual text and code generation tasks.
Multimodal Fusion
Transformers, Supports Multiple Languages

Undi95
35
2
Liquid V1 7B
MIT
Liquid is an autoregressive generation paradigm that unifies visual understanding and generation: it tokenizes images into discrete codes and learns the embeddings of those codes alongside text tokens in a shared feature space.
Text-to-Image
Transformers, English

Junfeng5
11.35k
84
Pixtral Large Instruct 2411
Other
Pixtral-Large-Instruct-2411 is an instruction-tuned multimodal model based on Mistral AI technology. It accepts image and text input and supports multiple languages.
Image-to-Text
Transformers, Supports Multiple Languages

nintwentydo
23
2